Smart Paper Notes

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
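
Two of the practices the survey asks about, K-fold cross-validation on the training set and ensembling of multiple models, can be sketched in a few lines. This is only an illustrative sketch: the helper names `k_fold_indices` and `ensemble_mean`, the fold count, and the averaging rule are assumptions for illustration and are not taken from the survey.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    # Shuffle the sample indices once, then cut them into k nearly equal folds.
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

def ensemble_mean(prob_maps):
    """Average the per-model probability maps (one simple ensembling rule)."""
    return np.mean(np.stack(prob_maps, axis=0), axis=0)

# Hypothetical usage: 10 training cases, 5 folds, one model per fold.
splits = list(k_fold_indices(n_samples=10, k=5))
```

Training one model per fold and averaging their predictions with `ensemble_mean` would be one way to obtain an ensemble of identical models; heterogeneous ensembles would combine different architectures instead.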

Translated by Google Translate

DDXPlus: A New Dataset For Automatic Medical Diagnosis

Arsene Fansi Tchango, Rishab Goel, Zhi Wen, Julien Martel, Joumana Ghosn

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-05-18

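There has been rapidly growing interest in Automatic Symptom Detection (ASD) and Automatic Diagnosis (AD) systems in the machine learning research literature, aiming to assist doctors in telemedicine services. These systems are designed to interact with patients, collect evidence about their symptoms and relevant antecedents, and possibly make predictions about the underlying diseases. Doctors would review the interactions, including the evidence and the predictions, and collect additional information from patients if necessary before deciding on next steps. Despite recent progress in this area, an important piece of doctors' interactions with patients is missing from the design of these systems, namely the differential diagnosis. Its absence is largely due to the lack of datasets that include such information for models to train on. In this work, we present a large-scale synthetic dataset of roughly 1.3 million patients that includes a differential diagnosis, along with the ground truth pathology, symptoms and antecedents, for each patient. Unlike existing datasets, which only contain binary symptoms and antecedents, this dataset also contains categorical and multi-choice symptoms and antecedents useful for efficient data collection. Moreover, some symptoms are organized in a hierarchy, making it possible to design systems able to interact with patients in a logical way. As a proof of concept, we extend two existing AD and ASD systems to incorporate the differential diagnosis, and provide empirical evidence that using differentials as training signals is essential for the efficiency of such systems. The dataset is available at https://figshare.com/articles/dataset/ddxplus_dataset/20043374.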

CryoAI: Amortized Inference of Poses for Ab Initio Reconstruction of 3D Molecular Volumes from Real Cryo-EM Images

Axel Levy, Frédéric Poitevin, Julien Martel, Youssef Nashed, Ariana Peck, Nina Miolane, Daniel Ratner, Mike Dunne, Gordon Wetzstein

Categories: Computer Vision | Machine Learning

2022-03-15

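Cryo-electron microscopy (cryo-EM) has become a tool of fundamental importance in structural biology, helping us understand the basic building blocks of life. The algorithmic challenge of cryo-EM is to jointly estimate the unknown 3D poses and the 3D electron scattering potential of a biomolecule from millions of extremely noisy 2D images. Existing reconstruction algorithms, however, cannot easily keep pace with the rapidly growing size of cryo-EM datasets due to their high computational and memory cost. We introduce cryoAI, an ab initio reconstruction algorithm for homogeneous conformations that uses direct gradient-based optimization of particle poses and the electron scattering potential from single-particle cryo-EM data. CryoAI combines a learned encoder that predicts the pose of each particle image with a physics-based decoder that aggregates each particle image into an implicit representation of the scattering potential volume. This volume is stored in the Fourier domain for computational efficiency and leverages a modern coordinate network architecture for memory efficiency. Combined with a symmetrized loss function, this framework achieves results of a quality on par with state-of-the-art cryo-EM solvers for both simulated and experimental data, one order of magnitude faster for large datasets and with significantly lower memory requirements than existing methods.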
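
A common reason for working in the Fourier domain in tomographic reconstruction is the Fourier slice theorem: the Fourier transform of a projection equals a central slice of the Fourier transform of the object. The abstract does not spell this out, so treating it as the source of the stated efficiency is an assumption here. The sketch below only checks the theorem numerically in a 2D-to-1D toy setting with a synthetic random array; it is not the cryoAI decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
density = rng.random((64, 64))  # synthetic stand-in for a 2D "object"

# Real-space route: integrate along the viewing direction (axis 0),
# then take the 1D Fourier transform of the projection.
projection = density.sum(axis=0)
ft_of_projection = np.fft.fft(projection)

# Fourier-space route: take the 2D transform once and read out the
# central line, i.e. zero frequency along the projected axis.
central_slice = np.fft.fft2(density)[0, :]

assert np.allclose(ft_of_projection, central_slice)
```

In 3D the same identity relates a 2D projection image to a planar slice through the 3D transform of the volume, so simulating a projection at a given pose only requires evaluating the volume on one plane.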

MantissaCam: Learning Snapshot High-dynamic-range Imaging with Perceptually-based In-pixel Irradiance Encoding

Haley M. So, Julien N. P. Martel, Piotr Dudek, Gordon Wetzstein

Categories: Computer Vision

2021-12-09

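The ability to image high-dynamic-range (HDR) scenes is crucial in many computer vision applications. The dynamic range of conventional sensors, however, is fundamentally limited by their well capacity, resulting in saturation of bright scene parts. To overcome this limitation, emerging sensors offer in-pixel processing capabilities to encode the incident irradiance. Among the most promising encoding schemes is modulo wrapping, which results in a computational photography problem where the HDR scene is computed by an irradiance unwrapping algorithm from the wrapped low-dynamic-range (LDR) sensor image. Here, we design a neural network-based algorithm that outperforms previous irradiance unwrapping methods and, more importantly, we design a perceptually inspired "mantissa" encoding scheme that more efficiently wraps an HDR scene into an LDR sensor. Combined with our reconstruction framework, MantissaCam achieves state-of-the-art results among modulo-type snapshot HDR imaging approaches. We demonstrate the efficacy of our method in simulation and show preliminary results of a prototype MantissaCam implemented with a programmable sensor.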
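
To make the modulo wrapping idea concrete, the sketch below wraps a synthetic 1D irradiance profile and recovers it with a naive scanline unwrapper. It is not the paper's neural unwrapping algorithm and does not implement the mantissa encoding; the function names, the `full_well` parameter, and the smoothness assumption (neighbouring pixels differ by less than half the well capacity, and the first pixel is not wrapped) are all assumptions made for illustration.

```python
import numpy as np

def modulo_wrap(irradiance, full_well=1.0):
    """Simulate a modulo sensor: the pixel value resets whenever it reaches full_well."""
    return np.mod(irradiance, full_well)

def unwrap_scanline(wrapped, full_well=1.0):
    """Naive 1D unwrapping under a smoothness assumption.

    Assumes the true signal changes by less than full_well / 2 between
    neighbouring pixels and that the first pixel is not wrapped.
    """
    diffs = np.diff(wrapped)
    # A jump of more than half the range is interpreted as a wrap event.
    wraps = np.round(diffs / full_well)
    true_diffs = diffs - full_well * wraps
    return wrapped[0] + np.concatenate([[0.0], np.cumsum(true_diffs)])

# Synthetic HDR scanline: starts at 0.5 and peaks at 3.5 times the well capacity.
t = np.linspace(0.0, 2.0 * np.pi, 200)
hdr = 0.5 + 1.5 * (1.0 - np.cos(t))
ldr = modulo_wrap(hdr)
recovered = unwrap_scanline(ldr)
```

Real scenes violate the smoothness assumption at edges and highlights, which is where a learned unwrapping method and a better-suited encoding would be expected to matter.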

Common Limitations of Image Processing Metrics: A Picture Story

Annika Reinke, Minu D. Tizabi, Carole H. Sudre, Matthias Eisenmann, Tim Rädsch, Michael Baumgartner, Laura Acion, Michela Antonelli, Tal Arbel, Spyridon Bakas

Categories: Computer Vision

2021-04-12

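While the importance of automatic image analysis is continuously increasing, recent meta-research has revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the automatic algorithms used, but relatively little attention has been given to the practical pitfalls of using specific metrics for a given image analysis task. These are typically related to (1) the disregard of inherent metric properties, such as the behaviour in the presence of class imbalance or small target structures, (2) the disregard of inherent data set properties, such as the non-independence of test cases, and (3) the disregard of the actual biomedical domain interest that the metrics should reflect. The purpose of this dynamically updated document is to illustrate important limitations of performance metrics commonly applied in the field of image analysis. In this context, it focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection tasks. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide.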
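
The sensitivity of overlap metrics to small target structures, pitfall (1) above, can be illustrated with the Dice similarity coefficient. The masks below are synthetic and chosen purely for illustration: the same one-pixel shift of the prediction is applied to a small and to a large structure.

```python
import numpy as np

def dice(pred, ref):
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return float("nan")  # undefined when both masks are empty
    return 2.0 * np.logical_and(pred, ref).sum() / denom

def square_mask(top, left, size, shape=(32, 32)):
    """Binary mask containing a single filled square (hypothetical helper)."""
    mask = np.zeros(shape, dtype=bool)
    mask[top:top + size, left:left + size] = True
    return mask

# Small structure (2 x 2 pixels), prediction shifted right by one pixel.
dice_small = dice(square_mask(10, 11, 2), square_mask(10, 10, 2))  # 0.5

# Large structure (20 x 20 pixels), same one-pixel shift.
dice_large = dice(square_mask(5, 6, 20), square_mask(5, 5, 20))    # 0.95
```

An identical localisation error costs half of the score for the small structure but only five percentage points for the large one, which is why averaging Dice across structures of very different sizes can be misleading.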

Implicit Neural Representations with Periodic Activation Functions

Vincent Sitzmann, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, Gordon Wetzstein

Categories:

2020-06-17

Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal's spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or SIRENs, are ideally suited for representing complex natural signals and their derivatives. We analyze SIREN activation statistics to propose a principled initialization scheme and demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how SIRENs can be leveraged to solve challenging boundary value problems, such as particular Eikonal equations (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. Lastly, we combine SIRENs with hypernetworks to learn priors over the space of SIREN functions. Please see the project website for a video overview of the proposed method and all applications.
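
A minimal NumPy sketch of a sine-activated coordinate network may help make the idea concrete. The abstract only mentions "a principled initialization scheme" without giving it; the frequency scale `w0 = 30` and the uniform bounds used below are assumptions based on commonly cited descriptions of SIREN and should be checked against the paper and its reference implementation. The function names are hypothetical.

```python
import numpy as np

def init_siren(layer_sizes, w0=30.0, seed=0):
    """Initialise weights for a sine-activated MLP (assumed scheme, see lead-in)."""
    rng = np.random.default_rng(seed)
    params = []
    for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        if i == 0:
            bound = 1.0 / n_in                 # assumed first-layer bound
        else:
            bound = np.sqrt(6.0 / n_in) / w0   # assumed bound for later layers
        weights = rng.uniform(-bound, bound, size=(n_in, n_out))
        params.append((weights, np.zeros(n_out)))
    return params

def siren_forward(params, coords, w0=30.0):
    """Map input coordinates to signal values with sin activations on hidden layers."""
    h = coords
    for weights, bias in params[:-1]:
        h = np.sin(w0 * (h @ weights + bias))
    weights, bias = params[-1]
    return h @ weights + bias                  # linear output layer

# Hypothetical usage: evaluate an untrained network on a 16 x 16 grid in [-1, 1]^2.
axis = np.linspace(-1.0, 1.0, 16)
grid = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1).reshape(-1, 2)
params = init_siren([2, 64, 64, 1])
values = siren_forward(params, grid)
```

Because the derivative of a sine is a shifted sine, the derivatives of such a network are themselves networks of the same form, which is consistent with the abstract's emphasis on representing signals together with their derivatives.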

Event Based, Near Eye Gaze Tracking Beyond 10,000Hz

Anastasios N. Angelopoulos, Julien N. P. Martel, Amit P. S. Kohli, Jorg Conradt, Gordon Wetzstein

Categories: Computer Vision

2020-04-07

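The cameras in modern gaze-tracking systems suffer from fundamental bandwidth and power limitations, realistically constraining data acquisition speed to 300 Hz. This obstructs the use of mobile eye trackers to perform, for example, low-latency predictive rendering, or to study quick and subtle eye motions such as microsaccades using head-mounted devices in the wild. Here, we propose a hybrid frame-event-based near-eye gaze tracking system offering update rates beyond 10,000 Hz with an accuracy that matches that of high-end desktop-mounted commercial trackers when evaluated under the same conditions. Our system builds on emerging event cameras that simultaneously acquire regularly sampled frames and adaptively sampled events. We develop an online 2D pupil fitting method that updates a parametric model every one or few events. Moreover, we propose a polynomial regressor for estimating the point of gaze from the parametric pupil model in real time. Using the first event-based gaze dataset, available at https://github.com/aangelopoulos/event_based_gaze_tracking, we demonstrate that our system achieves accuracies of 0.45 to 1.75 degrees for fields of view from 45 to 98 degrees. With this technology, we hope to enable a new generation of ultra-low-latency gaze-contingent rendering and display techniques for virtual and augmented reality.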
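
The polynomial regressor mentioned above can be sketched as an ordinary least-squares fit. The abstract does not give the polynomial degree or the exact pupil parameters used as input, so the second-order polynomial in the pupil centre below, the function names, and the synthetic calibration data are all assumptions for illustration.

```python
import numpy as np

def poly_features(pupil_xy):
    """Second-order polynomial features of the pupil centre (assumed input)."""
    x, y = pupil_xy[:, 0], pupil_xy[:, 1]
    return np.stack([np.ones_like(x), x, y, x * y, x**2, y**2], axis=1)

def fit_gaze_regressor(pupil_xy, gaze_xy):
    """Least-squares fit of the polynomial coefficients from calibration data."""
    coeffs, *_ = np.linalg.lstsq(poly_features(pupil_xy), gaze_xy, rcond=None)
    return coeffs  # shape (6, 2): one column per gaze coordinate

def predict_gaze(coeffs, pupil_xy):
    """Evaluate the fitted polynomial; this is a single small matrix product."""
    return poly_features(pupil_xy) @ coeffs

# Synthetic calibration data (not from the dataset): a known quadratic mapping.
rng = np.random.default_rng(0)
pupil = rng.uniform(-1.0, 1.0, size=(50, 2))
true_coeffs = rng.normal(size=(6, 2))
gaze = poly_features(pupil) @ true_coeffs
coeffs = fit_gaze_regressor(pupil, gaze)
```

Since prediction reduces to evaluating six features and one matrix product, a regressor of this kind is cheap enough to be re-evaluated at very high update rates once the pupil model changes.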

Emotion Recognition with Pre-Trained Transformers Using Multimodal Signals

Juan Vazquez-Rodriguez, Grégoire Lefebvre, Julien Cumin, James L Crowley

Categories: Artificial Intelligence | Machine Learning

2022-12-22

In this paper, we address the problem of multimodal emotion recognition from multiple physiological signals. We demonstrate that a Transformer-based approach is suitable for this task. In addition, we present how such models may be pretrained in a multimodal scenario to improve emotion recognition performance. We evaluate the benefits of using multimodal inputs and pre-training with our approach on a state-of-the-art dataset.

Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning

Julien Denize, Jaonary Rabarisoa, Astrid Orcesi, Romain Hérault

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-12-21

Contrastive representation learning has proven to be an effective self-supervised learning method for images and videos. Most successful approaches are based on Noise Contrastive Estimation (NCE) and use different views of an instance as positives that should be contrasted with other instances, called negatives, that are considered as noise. However, several instances in a dataset are drawn from the same distribution and share underlying semantic information. A good data representation should contain relations between the instances, or semantic similarity and dissimilarity, which contrastive learning harms by considering all negatives as noise. To circumvent this issue, we propose a novel formulation of contrastive learning using semantic similarity between instances called Similarity Contrastive Estimation (SCE). Our training objective is a soft contrastive one that brings the positives closer and estimates a continuous distribution to push or pull negative instances based on their learned similarities. We empirically validate our approach on both image and video representation learning. We show that SCE performs competitively with the state of the art on the ImageNet linear evaluation protocol for fewer pretraining epochs and that it generalizes to several downstream image tasks. We also show that SCE reaches state-of-the-art results for pretraining video representation and that the learned representation can generalize to video downstream tasks.
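
The idea of a soft contrastive objective can be sketched as a cross-entropy against a target that mixes the usual one-hot positive with a similarity distribution over the other instances. The sketch below is a simplified reading of the abstract, not the authors' exact formulation: the mixing weight `lam`, the temperatures `tau` and `tau_m`, the use of a single batch for both distributions, and the absence of a momentum encoder or memory buffer are all assumptions.

```python
import numpy as np

def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def soft_contrastive_loss(z_online, z_target, lam=0.5, tau=0.1, tau_m=0.05):
    """Cross-entropy between a soft target and the predicted similarity distribution.

    Row i of z_online and row i of z_target are two views of instance i.
    Requires at least two instances in the batch.
    """
    z_online = z_online / np.linalg.norm(z_online, axis=1, keepdims=True)
    z_target = z_target / np.linalg.norm(z_target, axis=1, keepdims=True)
    n = z_online.shape[0]
    sim = z_online @ z_target.T  # cosine similarities, shape (n, n)

    # Relational part of the target: a sharpened similarity distribution
    # over the *other* instances (the positive itself is masked out).
    masked = np.where(np.eye(n, dtype=bool), -np.inf, sim / tau_m)
    relation = np.exp(log_softmax(masked, axis=1))

    # Soft target: the positive gets weight lam, the rest is spread over
    # negatives according to how similar they appear to be.
    target = lam * np.eye(n) + (1.0 - lam) * relation

    log_pred = log_softmax(sim / tau, axis=1)
    return float(-(target * log_pred).sum(axis=1).mean())
```

With `lam = 1.0` the target collapses to the one-hot positive and the loss reduces to a standard InfoNCE-style objective, so `lam` controls how much weight the inter-instance relations receive. In a training framework the target would normally be treated as a constant (no gradient flowing through it).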

Dirichlet-Survival Process: Scalable Inference of Topic-Dependent Diffusion Networks

Gaël Poux-Médard, Julien Velcin, Sabine Loudcher

Categories: Machine Learning

2022-12-12

Information spread on networks can be efficiently modeled by considering three features: documents' content, time of publication relative to other publications, and position of the spreader in the network. Most previous works model up to two of those jointly, or rely on heavily parametric approaches. Building on recent Dirichlet-Point processes literature, we introduce the Houston (Hidden Online User-Topic Network) model, which jointly considers all those features in a non-parametric unsupervised framework. It infers dynamic topic-dependent underlying diffusion networks in a continuous-time setting along with said topics. It is unsupervised; it considers an unlabeled stream of triplets shaped as (time of publication, information's content, spreading entity) as input data. Online inference is conducted using a sequential Monte-Carlo algorithm that scales linearly with the size of the dataset. Our approach yields substantial improvements over existing baselines on both cluster recovery and subnetwork inference tasks.
